Improved performance of sequence search algorithms in remote homology detection
Authors
Abstract
The protein sequence space is vast and diverse, spanning many families. Biologically meaningful relationships exist between proteins at the superfamily level, but they are difficult to establish convincingly by simple sequence searches. A rigorous sequence search strategy is therefore needed to establish remote homology relationships and achieve high coverage. We have used iterative profile-based methods, together with sequence-motif constraints, to specify search directions. We address the importance of multiple start points (queries) for achieving high coverage at the protein superfamily level. We have devised strategies that use a structural regime to search sequence space with good specificity and sensitivity. We employ two well-known sequence search methods, PSI-BLAST and PHI-BLAST, with multiple queries and multiple patterns to enhance homologue identification at the structural superfamily level. The study suggests that multiple queries improve sensitivity, whereas a pattern-constrained iterative sequence search is stringent at the initial stages, which drives the search in a specific direction while still achieving high coverage. This data mining approach has been applied to the entire structural superfamily database.
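The multi-query, multi-pattern idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the file names, the database name `superfamily_db`, the iteration count, and the E-value threshold are all placeholders, and the helper names `build_searches` and `pooled_coverage` are hypothetical. The sketch assumes the NCBI BLAST+ `psiblast` program, where a pattern-seeded (PHI-BLAST style) first iteration is requested with `-phi_pattern`; the exact flags should be checked against `psiblast -help` for the installed version. It only builds argument lists and pools hit sets, and does not run any search.

```python
from itertools import product


def build_searches(queries, patterns, db="superfamily_db",
                   iterations=5, evalue=1e-3):
    """Return one psiblast argument list per (query, pattern) pair.

    A pattern of None stands for a plain PSI-BLAST run. Otherwise the
    pattern file is passed with -phi_pattern, so the first iteration is
    constrained by the motif. All default values here are assumptions.
    """
    commands = []
    for query, pattern in product(queries, [None, *patterns]):
        cmd = ["psiblast", "-query", query, "-db", db,
               "-num_iterations", str(iterations),
               "-inclusion_ethresh", str(evalue),
               "-outfmt", "6"]
        if pattern is not None:
            cmd += ["-phi_pattern", pattern]
        commands.append(cmd)
    return commands


def pooled_coverage(hits_per_search, known_members):
    """Fraction of known superfamily members found by at least one search."""
    known = set(known_members)
    if not known:
        return 0.0
    found = set().union(*hits_per_search) & known
    return len(found) / len(known)


if __name__ == "__main__":
    # Two hypothetical query files and one hypothetical pattern file.
    for cmd in build_searches(["q1.fa", "q2.fa"], ["p1.pat"]):
        print(" ".join(cmd))
    # Toy hit sets: pooling the searches recovers 3 of 4 known members.
    print(pooled_coverage([{"a", "b"}, {"b", "c"}], ["a", "b", "c", "d"]))
```

Pooling the hit sets as a union reflects the claim that multiple start points raise coverage: a member missed from one query can still be recovered from another. A pattern file is only meaningful for queries that actually contain the motif, so a real run would pair each pattern with matching queries rather than taking the full cross product used here.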
Similar resources
Combining Three Scoring Algorithms for Representing Protein Sequence
Effective representation of the protein sequence is a key issue in detecting remote protein homology. Recent work using string kernels for protein data has achieved state-of-the-art performance for protein classification. However, such representations suffer from the high dimensionality problem. In this work, we introduce a simple method based on representing the protein sequence by fix dime...
Revisiting amino acid substitution matrices for identifying distantly related proteins
MOTIVATION: Although many amino acid substitution matrices have been developed, it has not been well understood which is the best for similarity searches, especially for remote homology detection. Therefore, we collected information related to existing matrices, condensed it and derived a novel matrix that can detect more remote homology than ever. RESULTS: Using principal component analysis wi...
AN IMPROVED CHARGED SYSTEM SEARCH FOR STRUCTURAL DAMAGE IDENTIFICATION IN BEAMS AND FRAMES USING CHANGES IN NATURAL FREQUENCIES
It is well known that damaged structural members may alter the behavior of the structures considerably. Careful observation of these changes has often been viewed as a means to identify and assess the location and severity of damages in structures. Among the responses of a structure, natural frequencies are both relatively easy to obtain and independent from external excitation, and therefore, ...
Profile-based direct kernels for remote homology detection and fold recognition
MOTIVATION: Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS: We...
Remote homology detection of integral membrane proteins using conserved sequence features.
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new prof...
Journal title:
Volume 2, Issue
Pages -
Publication date 2013